We will analyse NanoString CosMx Spatial Molecular Imaging data from
a tissue slide of high-grade serous ovarian carcinoma
(HGSOC).
Ovarian cancer remains the eighth leading cause of cancer deaths in
women worldwide and HGSOC is the most common and lethal histologic
subtype.
Genomes of HGSOC are highly heterogeneous with most
alterations only found in a small fraction of tumours.
Also, due to a high degree of chromosomal instability,
most HGSOCs are polyclonal. As the cancer progresses and metastasises,
clonal diversity increases, which is associated with
worse prognosis and development of
chemoresistance.
Standard first-line treatment for HGSOC typically consists of debulking surgery, which involves removal of as much of the tumor as possible, followed by intravenous paclitaxel/platinum-based chemotherapy, and often subsequent maintenance therapy.
Investigating tumor heterogeneity, The Cancer Genome Atlas (TCGA)
project revealed four molecular subtypes of HGSOC based
on bulk expression measurements: mesenchymal, proliferative,
immunoreactive, and differentiated. These subtypes are associated with
differences in prognosis.
It is now generally accepted that transcriptional subtypes of HGSOC
largely reflect the degree of immune cell infiltration and the
abundance of fibroblasts, rather than inherent differences in
tumour cells.
To determine how these non-malignant cell types might influence tumour
growth and prognosis, ST data hold the potential to reconstruct
ligand-receptor interactions between stromal, immune and tumour cell
populations, resolving the tumor tissue architecture at spatial
resolution.
In this hackathon we will analyze data of tumour sample collected during interval debulking surgery from HGSOC patient with a good response to taxane- and platinum-based neoadjuvant chemotherapy (NACT) treatment.
Data were retrieved from Denisenko, E. et al., Spatial transcriptomics reveals discrete tumour microenvironments and autocrine loops within ovarian cancer subclones. Nat Commun. (2024)
Before starting, let’s set the stage to be all on the same
ground.
You have been provided with configuration files. Please select “your”
environment and the “helper” functions.
source('../00_environment.R')
source("../00_helper_functions.R")
source("level_2_libraries.R")
To create the SeuratObject representing the ST data of
the sequenced slide we will rely on information stored in the
CosMx folder in input. This folder
contains:
gene expression matrix:
cell (rows) X genes (columns) transcript count matrix.
CosMx count matrix also includes negative probes, which
can be used to estimate signal-to-noise ratio in the resulting
transcript quantification;
polygon file: list of x/y point coordinates that
can be used to reconstruct the single cell boundaries.
This file will not be used in this level, but it is generally useful for
visualization purposes and if you want to estimate cell-to-cell
distances in a more precise way (i.e., membrane-to-membrane
distance);
single cell metadata table:
By default, the CosMx metadata will always include:
| What | Path to |
|---|---|
| gene expression matrix | ../input/CosMx/exprMat_file.csv |
| polygon file | ../input/CosMx/polygons.csv |
| single cell metadata table | ../input/CosMx/metadata_file.csv |
Given paths reported in the table above, load CosMx data into a
SeuratObject.
A SeuratObject is a specialized data structure developed in
the Seurat R package,
enabling the study of gene expression in the context of tissue
architecture.
This SeuratObject is a container that organizes and stores
both the cell-level expression data and the associated image of the
tissue slide.
Create a SeuratObject containing cell-level gene
expression counts and the image of the tissue slide; in order:
Load gene expression count matrix.
The CosMx gene expression matrix comes in a dense csv
format.
We recommend using the read_csv_arrow() function from the
arrow package to speed up loading the csv file into
R.
Be sure that the cell identifiers across the different files
follow the same format.
In our case, create a new cell identifier by merging the numeric
identifier of each single cell cell_ID with the numeric
identifier of its FOV of origin fov.
Once the matrix has been loaded, make sure to have a genes (rows)
X cells (columns) orientation required to work with
Seurat.
If the orientation does not match, you can use the
Matrix::t() function to transpose the whole
matrix.
Assign the new cell identifiers previously generated to column
names.
In order to build your SeuratObject, you will also
need to load the CosMx cell metadata.
cell_ID with the numeric identifier of its FOV of origin
fov.Assess the presence of negative probes in the panel.
Check whether information for all cells is present in both count
matrix and metadata.
Reorder rows of the metadata file according to the order by which cells
are in the expression matrix.
Create a SeuratObject using the main gene expression
count with the function CreateSeuratObject().
As input files, use the count matrix and the metadata files you
manipulated. Specify your assay and
project.
Using the function CreateAssayObject(), generate an
additional assay containing the negative probes counts and add it to the
SeuratObject you built.
Add the image to the SeuratObject.
Create an object retrieving cell centroids using the function
CreateCentroids(). Bear in mind to use as coordinates for
centroids the columns CenterX_global_px and
CenterY_global_px, included in the metadata file.
From the object including centroids, create another object with
the spatial coordinates of your cells using the function
CreateFOV().
Specify type = centroids and the name of your
assay.
Finally, add the spatial object to your main
SeuratObject.
Questions
How many cells are we considering in the assay?
How many genes are included in the panel?
How many FOVs cover the tissue slide?
require(Matrix)
# Load csv file
mat = arrow::read_csv_arrow('../input/CosMx/exprMat_file.csv')
# Create new cell identifiers:
# we will create this common format by merging the numeric identifier of each single cell with the numeric identifier of its FOV of origin.
ID = paste0(mat$fov,'_',mat$cell_ID)
mat$fov = NULL
mat$cell_ID = NULL
# Convert count matrix in a sparse format (easier to manage)
mat = as(as.matrix(mat),'CsparseMatrix')
# Transpose matrix to have genes as rows and cells as columns
mat = Matrix::t(mat)
# Set column name as cell IDs
colnames(mat) = ID
message('N genes = ' , nrow(mat), ' (negative probes included)')
message('N cells = ' , ncol(mat))
meta = arrow::read_csv_arrow('../input/CosMx/metadata_file.csv')
# Create new cell identifier as done before
meta$cell_ID = paste0(meta$fov,'_',meta$cell_ID)
meta$fov = NULL
head(meta)
## # A tibble: 6 × 19
## cell_ID Area AspectRatio CenterX_local_px CenterY_local_px CenterX_global_px
## <chr> <int> <dbl> <int> <int> <dbl>
## 1 1_1 9074 1.97 1757 3612 5501.
## 2 1_2 7494 1.55 4931 3610 8675.
## 3 1_3 6621 1.25 5343 3609 9087.
## 4 1_4 3451 1.09 250 3617 3994.
## 5 1_5 10484 1.23 2758 3596 6502.
## 6 1_6 9509 1.1 2208 3593 5952.
## # ℹ 13 more variables: CenterY_global_px <dbl>, Width <int>, Height <int>,
## # Mean.MembraneStain <int>, Max.MembraneStain <int>, Mean.PanCK <int>,
## # Max.PanCK <int>, Mean.CD45 <int>, Max.CD45 <int>, Mean.CD3 <int>,
## # Max.CD3 <int>, Mean.DAPI <int>, Max.DAPI <int>
# Create a new matrix containing only negative probes by doing a `grep` on row names containing pattern 'NegPrb'
negmat = mat[grepl('NegPrb', rownames(mat)),]
print(paste0('N negative probes (NegPrb) = ', nrow(negmat)))
## [1] "N negative probes (NegPrb) = 19"
# Remove negative probes from counts
mat = mat[!grepl('NegPrb', rownames(mat)),]
message('N genes = ' , nrow(mat), ' (negative probes included)')
message('N cells = ' , ncol(mat))
# Check cell id match between columns in gene expression matrix and metadata files
match_ids = match(colnames(mat), meta$cell_ID)
# Let's count how many usable cells (those whose expression data is matched with annotations in metadata) we are considering
# Of these, check how many are associated to genes or negative probes
if(sum(is.na(match_ids))>0){
message(sum(is.na(match_ids)), ' cells do not have metadata information')
usable_cells = which(!is.na(match_ids))
mat = mat[,usable_cells]
negmat = negmat[,usable_cells]
}
# Reorder metadata
meta = meta[match(colnames(mat), meta$cell_ID),]
rownames(meta) = meta$cell_ID
require(Seurat)
# Create a SeuratObject with main gene expression matrix
cosmx <- CreateSeuratObject(counts = mat
, meta.data = meta
, assay='CosMx'
, project = 'CosMx'
)
# As an additional assay, we add the counts of negative probes in the same format: features (rows) x cells (columns).
cosmx[["negprobes"]] = CreateAssayObject(negmat)
DefaultAssay(cosmx) <- 'CosMx' # The default assay is set to `CosMx`
# Next, we add single-cell spatial coordinates info (i.e., centroid coordinates)
# Prepare centroid coordinates retrieving coordinates from metadata file
centroids <- CreateCentroids(coords = meta %>% select(CenterY_global_px, CenterX_global_px, cell_ID))
# Create an object with the spatial coordinates of cells
coords <- CreateFOV(
coords = centroids, # Spatial coordinates retrieved before
type = "centroids",
molecules = NULL,
assay = "CosMx"
)
# Add the spatial object to the `SeuratObject` and assign 'SP5' sample name to the new `Seurat` spatial data
cosmx[['SP5']] = coords
Now, you have to:
Quantify the capture efficiency per cell defined as
the ratio of number of expressed genes (nFeature) and total
counts (nCount).
Add this metric in the meta.data dataframe included in your
SeuratObject.
Assess the correlation between metrics included in
meta.data generating a pairs plot.
Evaluate how multiple FOVs cover the tissue slide using the
function ImageDimPlot() .
A good experiment will be the one where FOVs are not overlapping each
other. Are there any overlapping FOVs?
Create violin plots resembling QC metrics using Seurat function
VlnPlot().
Add a column in the metadata an additional identifier common to all cells, regardless of FOV.
Visualize QC metrics using violin plots considering all FOVs
together.
HINT: you have to set as currently active identity the newly
added column with the common identifier before using the
VlnPlot() function
Visualize QC features over the HE image exploiting
ImageFeaturePlot().
Remove cells with very few transcript/gene counts or noisy gene
expression profile.
By applying permissive thresholds, we exclude cells with insufficient
gene expression information, which cannot be used for downstream
analysis.
In particular, exclude cells with:
Generally speaking, we recommend to not apply too stringent thresholds on your single cells from the beginning, to avoid altering the interpretation of the cell ecosystem that characterize your CosMx sample.
# Assess the capture efficiency
# the capture efficiency is measured as the ratio nFeature_CosMx/nCount_CosMx per spot.
# We add this annotation by creating a new column directly on the meta.data dataframe
cosmx@meta.data$transcriptome_capture_efficiency = with(cosmx@meta.data, nFeature_CosMx/nCount_CosMx)
cosmx@meta.data$transcriptome_capture_efficiency[which(is.na(cosmx@meta.data$transcriptome_capture_efficiency ))] = 0
# Test the dependencies of the stats;
selected_features = c("nFeature_CosMx","nCount_CosMx", "transcriptome_capture_efficiency" )
# `pairs` creates a matrix of scatter plots for visualizing relationships between the selected features
# we specify to returns the correlation estimate in the panel upper the diagonal with the parameter `upper.panel`.
pairs(cosmx@meta.data[,selected_features]
, lower.panel = panel.smooth
, upper.panel = panel.cor
)
# Image with fluorescence signals localization
FL_image = ImageDimPlot(cosmx, fov = 'SP5', axes = TRUE, dark.background = T)
FL_image
# `VlnPlot` is a function included in Seurat package that creates violin plots to visualize the distribution of features of interest.
# With the `feature` parameter we specifically select to consider the measures included in the vector selected_features.
# On the x axis we have the currently active identity in our `SeuratObject`, e.g. `orig.ident`.
VlnPlot(cosmx, layer = 'counts'
, features = selected_features, pt.size = 0.1, ncol = 1) & theme(axis.title.x = element_blank())
# Compute the mean transcriptome capture efficiency for each FOV
mean_capture_efficiency_fov = ddply(cosmx@meta.data, .(orig.ident), summarise, mean(transcriptome_capture_efficiency))
mean_capture_efficiency_fov
## orig.ident mean(transcriptome_capture_efficiency)
## 1 1 0.5882037
## 2 10 0.6226096
## 3 11 0.6321418
## 4 12 0.6859654
## 5 13 0.6122349
## 6 14 0.6557548
## 7 15 0.5762404
## 8 16 0.5956720
## 9 17 0.5420021
## 10 18 0.6866663
## 11 19 0.5537052
## 12 2 0.7101698
## 13 21 0.5443845
## 14 22 0.5552921
## 15 4 0.5171560
## 16 5 0.5365214
## 17 6 0.5460309
## 18 7 0.5520892
## 19 8 0.6218029
## 20 9 0.5473153
cosmx@meta.data$sample = 'SP5'
Idents(cosmx) = cosmx$sample
VlnPlot(cosmx, layer = 'counts'
, features = selected_features, pt.size = 0.1, ncol = 1) & theme(axis.title.x = element_blank())
Idents(cosmx) = cosmx$orig.ident
# Let's see the statistics on the tissue slide:
# `SpatialDimPlot` and `SpatialFeaturePlot` are two main functions to plot features considering spatial information.
# Image with features plotted
FS_image = ImageFeaturePlot(cosmx, fov = 'SP5', features = selected_features, cols = viridis::viridis(n = 2),
, scale='feature', combine = T)
patchwork::wrap_plots(FL_image, FS_image, ncol = 1, nrow = 2)